Abstract

Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown distributions. We unify both theories and give strong arguments that the resulting universal AI$\xi$ model behaves optimally in any computable environment. The major drawback of the AI$\xi$ model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AI$\xi^{tl}$, which is still superior to any other time $t$ and space $l$ bounded agent. The computation time of AI$\xi^{tl}$ is of the order $t\cdot 2^l$.
1 Introduction

The most general framework for Artificial Intelligence is the picture of an agent interacting with an environment [RN95]. If the goal is not pre-specified, the agent has to learn by occasional reinforcement feedback [SB98]. If the agent is to be universal, no assumption about the environment may be made, except that some exploitable structure exists at all. We may ask for the most intelligent way an agent could behave, or for the optimal way of learning in terms of real-world interaction cycles. Decision theory formally¹ solves this problem only if the true environmental probability distribution is known (e.g. Backgammon) [Bel57, BT96]. Solomonoff's theory [Sol64, Sol78] formally solves the problem of induction if the true distribution is unknown, but only if the agent cannot influence the environment (e.g. weather forecasts) [LV97]. We combine both ideas and get a parameterless model AI$\xi$ of an acting agent, which we claim to behave optimally in any computable environment (e.g. prisoner or auction problems, poker, car driving). To get an effective solution, a modification AI$\xi^{tl}$, superior to any other time $t$ and space $l$ bounded agent, is constructed. The computation time of AI$\xi^{tl}$ is of the order $t\cdot 2^l$. The main goal of this work is to derive and discuss the AI$\xi$ and the AI$\xi^{tl}$ model, and to clarify the meaning of universal, optimal, superior, etc. Details can be found in [Hut00b].

¹ With a formal solution we mean a rigorous mathematical definition, uniquely specifying the solution. For the problems considered here this always implies the existence of an algorithm which asymptotically converges to the correct solution.



2 Rational Agents & Sequential Decisions

Agents in probabilistic environments: A very general framework for intelligent systems is that of rational agents [RN95]. In cycle $k$, an agent performs action $y_k \in Y$ (output word), which results in a perception $x_k \in X$ (input word), followed by cycle $k+1$ and so on. If agent and environment are deterministic and computable, the entanglement of both can be modeled by two Turing machines with two common tapes (and some private tapes) containing the action stream $y_1y_2y_3\ldots$ and the perception stream $x_1x_2x_3\ldots$ (the meaning of $x_k \equiv x'_kr_k$ is explained in the next paragraph).

Here $p$ is the policy of the agent interacting with environment $q$. We write $p(x_{<k}) = y_{1:k}$ to denote the output $y_{1:k} \equiv y_1\ldots y_k$ of the agent $p$ on input $x_{<k} \equiv x_1\ldots x_{k-1}$, and similarly $q(y_{1:k}) = x_{1:k}$ for the environment $q$. We call Turing machines $p$ and $q$ behaving in this way chronological. In the more general case of a probabilistic environment, given the history $yx_{<k}y_k \equiv y_1x_1\ldots y_{k-1}x_{k-1}y_k$, the probability that the environment leads to perception $x_k$ in cycle $k$ is (by definition) $\mu(yx_{<k}y\underline{x}_k)$. The underlined argument $x_k$ in $\mu$ is a probability variable and the other, non-underlined arguments $yx_{<k}y_k$ represent conditions. We call probability distributions like $\mu$ chronological.



The AI$\mu$ Model: The goal of the agent is to maximize future rewards, which are provided by the environment through the inputs $x_k$. The inputs $x_k \equiv x'_kr_k$ are divided into a regular part $x'_k$ and some (possibly empty or delayed) reward $r_k$. The $\mu$-expected reward sum of future cycles $k$ to $m$ with outputs $y_{k:m} \equiv y^p_{k:m}$ generated by the agent's policy $p$ can be written compactly as

$$V^p_\mu(\dot y\dot x_{<k}) := \sum_{x_k\ldots x_m}(r_k+\ldots+r_m)\,\mu(\dot y\dot x_{<k}\underline{yx}_{k:m}), \qquad (1)$$

where $m$ is the lifespan of the agent, and the dots above $\dot y\dot x_{<k}$ indicate the actual action and perception history. The $\mu$-expected reward sum of future cycles $k$ to $m$ with outputs $y_i$ generated by the ideal agent, which maximizes the expected future rewards, is

$$V^*_\mu(\dot y\dot x_{<k}) := \max_{y_k}\sum_{x_k}\ldots\max_{y_m}\sum_{x_m}(r_k+\ldots+r_m)\,\mu(\dot y\dot x_{<k}\underline{yx}_{k:m}), \qquad (2)$$

i.e. the best expected credit is obtained by averaging over the $x_i$ and maximizing over the $y_i$. This has to be done in chronological order to correctly incorporate the dependency of $x_i$ and $y_i$ on the history. The output $\dot y_k$ which achieves the maximal value defines the AI$\mu$ model:

$$\dot y_k := \arg\max_{y_k}\sum_{x_k}\ldots\max_{y_m}\sum_{x_m}(r_k+\ldots+r_m)\,\mu(\dot y\dot x_{<k}\underline{yx}_{k:m}). \qquad (3)$$

The AI$\mu$ model is optimal in the sense that no other policy leads to higher $\mu$-expected reward. A detailed derivation and other recursive and functional versions can be found in [Hut00b].

Sequential decision theory: Eq. (3) is essentially an Expectimax algorithm/sequence. One can relate (3) to the Bellman equations [Bel57] of sequential decision theory by identifying complete histories $yx_{<k}$ with states, $\mu(yx_{<k}y\underline{x}_k)$ with the state transition matrix, $V^*_\mu(yx_{<k})$ with the value of history/state $yx_{<k}$, and $y_k$ with the action in cycle $k$ [RN95, Hut00b]. Due to the use of complete histories as state space, the AI$\mu$ model assumes neither stationarity, nor the Markov property, nor complete accessibility of the environment. Every state occurs at most once in the lifetime of the system. As we have in mind a universal system with complex interactions, the action and perception spaces $Y$ and $X$ are huge (e.g. video images), and every action or perception itself usually occurs only once in the lifespan $m$ of the agent. As there is no (obvious) universal similarity relation on the state space, an effective reduction of its size is impossible, but there is no problem in principle in determining $y_k$ as long as $\mu$ is known and computable and $X$, $Y$ and $m$ are finite.

Reinforcement learning: Things change dramatically if $\mu$ is unknown. Reinforcement learning algorithms [LK96, SB98, BT96] are commonly used in this case to learn the unknown $\mu$. They succeed if the state space is either small or has effectively been made small by generalization or function approximation techniques. In any case, the solutions are either ad hoc, work in restricted domains only, have serious problems with state space exploration versus exploitation, or have a non-optimal learning rate. There is no universal and optimal solution to this problem so far. In Section 4 we present a new model and argue that it formally solves all these problems in an optimal way. The true probability distribution $\mu$ will not be learned directly, but will be replaced by a universal prior $\xi$, which is shown to converge to $\mu$ in a sense.
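The expectimax recursion (2) and (3) for a known $\mu$ can be illustrated on a tiny example. The sketch below rests on several assumptions. The environment is hypothetical. Each perception $x_k$ is just the reward $r_k$, so $x'_k$ is empty. The "state" is the complete history of (action, perception) pairs, as in the text. The names `mu`, `value`, `action_value` and `ai_mu_action` are made up for illustration. In this environment, action 1 pays 1 with probability 1/2, unless the previous action was 0, in which case it pays 3. The example shows the optimal first action changing with the horizon $m$.

```python
ACTIONS = (0, 1)

def mu(history, y):
    """Hypothetical known chronological environment: distribution over the next
    perception x_k (= reward r_k here) given the history and the action y_k."""
    charged = bool(history) and history[-1][0] == 0   # previous action was 0
    if y == 1:
        return {3: 1.0} if charged else {1: 0.5, 0: 0.5}
    return {0: 1.0}

def value(history, m):
    """V*_mu as in (2): maximize over y, mu-average over x, up to cycle m."""
    if len(history) == m:
        return 0.0
    return max(action_value(history, y, m) for y in ACTIONS)

def action_value(history, y, m):
    """Expected reward of taking y now and acting optimally afterwards."""
    return sum(p * (x + value(history + ((y, x),), m))
               for x, p in mu(history, y).items())

def ai_mu_action(history, m):
    """Output of (3): the maximizing action (the first one in ACTIONS on ties)."""
    return max(ACTIONS, key=lambda y: action_value(history, y, m))

print(ai_mu_action((), 1), value((), 1))   # short horizon: take the gamble now
print(ai_mu_action((), 2), value((), 2))   # longer horizon: "charge" first
```

The exhaustive recursion is exponential in $m$. This is harmless here, but it illustrates why (3) is only a formal solution once $X$, $Y$ and $m$ become large.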



3 Algorithmic Complexity and Universal Induction

The problem of the unknown environment: We have argued that there is currently no universal and optimal solution to reinforcement learning problems. On the other hand, [Sol64] defined a universal scheme of inductive inference, based on Epicurus' principle of multiple explanations, Ockham's razor, and Bayes' rule for conditional probabilities. For an excellent introduction one should consult the book [LV97]. In the following we outline the theory and the basic results.



Kolmogorov complexity and universal probability: Let us choose some universal prefix Turing machine $U$ with unidirectional binary input and output tapes and a bidirectional working tape. We can then define the (conditional) prefix Kolmogorov complexity [Cha75, Gác74, Kol65, Lev74] as the length $l$ of the shortest program $p$ for which $U$ outputs the binary string $x = x_{1:n}$ with $x_i \in \{0,1\}$:

$$K(x) := \min_p\{l(p) : U(p) = x\},$$

and given $y$

$$K(x|y) := \min_p\{l(p) : U(p,y) = x\}.$$
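$K$ itself is not computable. Any fixed, self-delimiting compressor still yields a computable upper bound on $K$, up to an additive constant that depends on the decompressor. The sketch below uses zlib as such a stand-in, which is an assumption for illustration and not part of the theory above. The helper name `complexity_upper_bound_bits` is hypothetical. The pseudo-random string actually has small $K$, because a short program plus a seed generates it, but zlib cannot find that description. A compressor therefore only ever bounds $K$ from above.

```python
import random
import zlib

def complexity_upper_bound_bits(data: bytes) -> int:
    """Hypothetical helper: bit length of a zlib encoding of `data`.
    Relative to a fixed decompressor this upper-bounds K(data) up to an
    additive constant; it is NOT K itself, which is uncomputable."""
    return 8 * len(zlib.compress(data, 9))

regular = b"01" * 5000                     # 10000 bytes with an obvious short description
# 10000 pseudo-random bytes: low true K (seeded PRNG), but incompressible for zlib
noise = random.Random(0).getrandbits(8 * 10000).to_bytes(10000, "big")

print(complexity_upper_bound_bits(regular))   # far below 8 * 10000
print(complexity_upper_bound_bits(noise))     # slightly above 8 * 10000
```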



The universal semimeasure $\xi(x)$ is defined as the probability that the output of $U$ starts with $x$ when provided with fair coin flips on the input tape [Sol64, Sol78]. It is easy to see that this is equivalent to the formal definition

$$\xi(x) := \sum_{p\,:\,\exists\omega:\,U(p)=x\omega} 2^{-l(p)}, \qquad (4)$$

where the sum is over minimal programs $p$ for which $U$ outputs a string starting with $x$. $U$ might be non-terminating. As the short programs dominate the sum, $\xi$ is closely related to $K(x)$ as $\xi(x) = 2^{-K(x)+O(K(l(x)))}$. $\xi$ has the important universality property [Sol64] that it dominates every computable probability distribution $\rho$ up to a multiplicative factor depending only on $\rho$ but not on $x$:

$$\xi(x) \geq 2^{-K(\rho)-O(1)}\cdot\rho(x). \qquad (5)$$

The Kolmogorov complexity of a function like $\rho$ is defined as the length of the shortest self-delimiting coding of a Turing machine computing this function. $\xi$ itself is not a probability distribution². We have $\xi(x0)+\xi(x1) < \xi(x)$ because there are programs $p$ which output just $x$, followed neither by $0$ nor by $1$. They just stop after printing $x$ or continue forever without any further output. We will call a function $\rho \geq 0$ with the properties $\rho(\epsilon) \leq 1$ and $\sum_{x_n}\rho(x_{1:n}) \leq \rho(x_{<n})$ a semimeasure. $\xi$ is a semimeasure, and (5) actually holds for all enumerable semimeasures $\rho$.
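The semimeasure property and its strict inequality can be made concrete with a hand-made "machine". The sketch uses a finite table of prefix-free programs, each mapped to the finite string it prints before halting. The table and the names `TOY_PROGRAMS`, `xi` and `check_semimeasure` are hypothetical. `xi` is the toy analogue of (4), and exact fractions avoid rounding. Program `10` prints exactly `01` and halts, so $\xi(010)+\xi(011) < \xi(01)$, which mirrors the argument in the text.

```python
import itertools
from fractions import Fraction

# Hypothetical toy machine: prefix-free programs -> output printed before halting.
TOY_PROGRAMS = {
    "0":   "0101",   # l(p) = 1
    "10":  "01",     # l(p) = 2, halts right after printing "01"
    "110": "0110",   # l(p) = 3
    "111": "1",      # l(p) = 3
}

def xi(x: str) -> Fraction:
    """Toy analogue of (4): total weight 2^-l(p) of programs whose output starts with x."""
    return sum((Fraction(1, 2 ** len(p)) for p, out in TOY_PROGRAMS.items()
                if out.startswith(x)), Fraction(0))

def check_semimeasure(max_len: int) -> bool:
    """Check xi(empty) <= 1 and xi(x0) + xi(x1) <= xi(x) for all x shorter than max_len."""
    for n in range(max_len):
        for bits in itertools.product("01", repeat=n):
            x = "".join(bits)
            if xi(x + "0") + xi(x + "1") > xi(x):
                return False
    return xi("") <= 1

print(xi("01"), xi("010") + xi("011"))   # 7/8 versus 5/8: strict inequality
```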

Universal sequence prediction: (Binary) sequence prediction algorithms try to predict the continuation $x_n$ of a given sequence $x_1\ldots x_{n-1}$. In the following we will assume that the sequences are drawn from a probability distribution and that the true probability of a string starting with $x_1\ldots x_n$ is $\mu(x_{1:n})$. The probability of $x_n$ given $x_{<n}$ hence is $\mu(x_{<n}\underline{x}_n)$. If we measure prediction quality as the number of correct predictions, the best possible system predicts the $x_n$ with the highest probability. Usually $\mu$ is unknown and the system can only have some belief $\rho$ about the true distribution $\mu$. Now the universal probability $\xi$ comes into play: [Sol78] has proved that the mean squared difference between $\xi$ and $\mu$ is finite for computable $\mu$:

$$\sum_{k=1}^{\infty}\sum_{x_{1:k}}\mu(x_{<k})\big(\xi(x_{<k}\underline{x}_k)-\mu(x_{<k}\underline{x}_k)\big)^2 < \ln 2\cdot K(\mu)+O(1). \qquad (6)$$

A simplified proof can be found in [Hut99]. So the difference between $\xi(x_{<n}\underline{x}_n)$ and $\mu(x_{<n}\underline{x}_n)$ tends to zero with $\mu$ probability 1 for any computable probability distribution $\mu$. The reason for the astonishing property of a single (universal) function to converge to any computable probability distribution lies in the fact that the sets of $\mu$-random sequences differ for different $\mu$. The universality property (5) is the central ingredient for proving (6).

² It is possible to normalize $\xi$ to a probability distribution, as has been done in [Sol78, Hut99], by giving up the enumerability of $\xi$. Bounds (6) and (8) hold for both definitions.

Error bounds: Let SP$\rho$ be a probabilistic sequence predictor, predicting $x_n$ with probability $\rho(x_{<n}\underline{x}_n)$. If $\rho$ is only a semimeasure, the SP$\rho$ system might refuse any output in some cycles $n$. Further, we define a deterministic sequence predictor SP$\Theta_\rho$ predicting the $x_n$ with highest $\rho$ probability: $\Theta_\rho(x_{<n}\underline{x}_n) := 1$ for one $x_n$ with $\rho(x_{<n}\underline{x}_n) \geq \rho(x_{<n}\underline{x}'_n)\ \forall x'_n$, and $\Theta_\rho(x_{<n}\underline{x}_n) := 0$ otherwise. SP$\Theta_\mu$ is the best prediction scheme when $\mu$ is known. If $\rho(x_{<n}\underline{x}_n)$ converges quickly to $\mu(x_{<n}\underline{x}_n)$, the number of additional prediction errors introduced by using $\Theta_\rho$ instead of $\Theta_\mu$ for prediction should be small in some sense. Let us define the total number of expected erroneous predictions the SP$\rho$ system makes for the first $n$ bits:

$$E_{n\rho} := \sum_{k=1}^{n}\sum_{x_{1:k}}\mu(x_{1:k})\big(1-\rho(x_{<k}\underline{x}_k)\big). \qquad (7)$$

The SP$\Theta_\mu$ system is best in the sense that $E_{n\Theta_\mu} \leq E_{n\rho}$ for any $\rho$. In [Hut99] it has been shown that SP$\Theta_\xi$ is not much worse:

$$E_{n\Theta_\xi}-E_{n\rho} \leq H+\sqrt{4E_{n\rho}H+H^2} = O\big(\sqrt{E_{n\rho}}\big), \qquad (8)$$

with $H < \ln 2\cdot K(\mu)+O(1)$, and the tightest bound is obtained for $\rho = \Theta_\mu$. For finite $E_{\infty\Theta_\mu}$, $E_{\infty\Theta_\xi}$ is finite too. For infinite $E_{\infty\Theta_\mu}$, $E_{n\Theta_\xi}/E_{n\Theta_\mu} \to 1$ with rapid convergence. One can hardly imagine any better prediction algorithm than SP$\Theta_\xi$ without extra knowledge about the environment. In [Hut00a], (6) and (8) have been generalized from binary to arbitrary alphabet and to general loss functions. Apart from computational aspects, which are of course very important, the problem of sequence prediction could be viewed as essentially solved.
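Bounds (6) and (8) can be checked numerically in a computable miniature. Replace "all computable $\mu$" by a hypothetical finite class of Bernoulli sources, with prior weights $w$ playing the role of $2^{-K(\mu)}$, so that $H$ becomes $\ln(1/w_\mu)$. The class, the weights and all function names below are assumptions made for illustration. For such a finite Bayes mixture, Pinsker's inequality together with $\xi(x_{1:n}) \geq w_\mu\mu(x_{1:n})$ gives the analogue of (6) with right-hand side $\ln(1/w_\mu)$. The code computes both sides of (6) and (8) exactly, by summing over all histories of length below $n$.

```python
import itertools
import math

# Hypothetical finite model class standing in for "all computable mu".
THETAS = (0.2, 0.5, 0.8)      # Bernoulli(theta) sources: P(next bit = 1) = theta
WEIGHTS = (0.25, 0.5, 0.25)   # prior weights, playing the role of 2^-K(mu)
MU = 0.8                      # the true source; its prior weight is 0.25

def xi_next_one(history):
    """Mixture (Bayes posterior predictive) probability that the next bit is 1."""
    post = [w * math.prod(t if b else 1 - t for b in history)
            for w, t in zip(WEIGHTS, THETAS)]
    return sum(p * t for p, t in zip(post, THETAS)) / sum(post)

def mu_prob(history):
    """True probability mu(x_{1:k}) of a history under Bernoulli(MU)."""
    return math.prod(MU if b else 1 - MU for b in history)

def bounds_check(n):
    sq_err = err_xi = err_mu = 0.0
    for k in range(1, n + 1):
        for hist in itertools.product((0, 1), repeat=k - 1):
            p_hist = mu_prob(hist)
            q1 = xi_next_one(hist)
            # Left-hand side of (6): both outcomes x_k contribute (q1 - MU)^2.
            sq_err += p_hist * 2 * (q1 - MU) ** 2
            # Theta_rho predicts 1 iff rho(next = 1) >= 1/2; an error happens
            # with the mu-probability of the other bit, as in (7).
            guess_xi, guess_mu = int(q1 >= 0.5), int(MU >= 0.5)
            err_xi += p_hist * (1 - MU if guess_xi else MU)
            err_mu += p_hist * (1 - MU if guess_mu else MU)
    return sq_err, err_xi, err_mu

n = 12
H = math.log(1 / WEIGHTS[THETAS.index(MU)])   # ln(1/w_mu), analogue of ln2*K(mu)
sq, e_xi, e_mu = bounds_check(n)
print(f"(6): {sq:.4f} <= {H:.4f}")
print(f"(8): E_xi - E_mu = {e_xi - e_mu:.4f} <= {H + math.sqrt(4 * e_mu * H + H * H):.4f}")
```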

4 The Universal AI$\xi$ Model

Definition of the AI$\xi$ Model: We have developed enough formalism to suggest our universal AI$\xi$ model. All we have to do is to suitably generalize the universal semimeasure $\xi$ from the last section and to replace the true but unknown probability $\mu$ in the AI$\mu$ model by this generalized $\xi$. In what sense this AI$\xi$ model is universal and optimal will be discussed thereafter.

We define the generalized universal probability $\xi^{AI}$ as the $2^{-l(q)}$ weighted sum over all chronological programs (environments) $q$ which output $x_{1:k}$, similar to (4) but with $y_{1:k}$ provided on the "input" tape:

$$\xi(\underline{yx}_{1:k}) := \sum_{q\,:\,q(y_{1:k})=x_{1:k}} 2^{-l(q)}. \qquad (9)$$

Replacing $\mu$ by $\xi$ in (3), the iterative AI$\xi$ system outputs

$$\dot y_k := \arg\max_{y_k}\sum_{x_k}\ldots\max_{y_m}\sum_{x_m}(r_k+\ldots+r_m)\,\xi(\dot y\dot x_{<k}\underline{yx}_{k:m}) \qquad (10)$$

in cycle $k$ given the history $\dot y\dot x_{<k}$.
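Definitions (9) and (10) can be mimicked with a finite, hand-picked class of environments in place of all chronological programs. The class is hypothetical. It contains two deterministic environments with assumed "program lengths" 2 and 3, and the perception is just the reward. The names `ENVIRONMENTS`, `xi`, `value`, `action_value` and `aixi_action` are illustrative only. The sketch shows the mechanics: conditionals of the mixture drive an expectimax, and each observed reward rules out inconsistent environments. It is not a computable version of AI$\xi$.

```python
ACTIONS = (0, 1)
PERCEPTS = (0, 1)   # x_k is just the reward r_k in this toy (x'_k empty)

# Hypothetical finite class of deterministic chronological environments:
# (l(q), q), where q maps the action sequence y_{1:k} to the perception x_k.
ENVIRONMENTS = [
    (2, lambda ys: 1 if ys[-1] == 1 else 0),   # rewards action 1
    (3, lambda ys: 1 if ys[-1] == 0 else 0),   # rewards action 0
]

def xi(ys, xs):
    """Toy version of (9): weight 2^-l(q) of all q with q(y_{1:i}) = x_i for every i."""
    return sum(2.0 ** -l for l, q in ENVIRONMENTS
               if all(q(ys[:i + 1]) == xs[i] for i in range(len(xs))))

def value(ys, xs, m):
    """Expectimax value of cycles len(ys)+1 .. m under the mixture, as in (10)."""
    if len(ys) == m:
        return 0.0
    return max(action_value(ys, xs, y, m) for y in ACTIONS)

def action_value(ys, xs, y, m):
    norm = xi(ys, xs)
    total = 0.0
    for x in PERCEPTS:
        p = xi(ys + (y,), xs + (x,)) / norm        # conditional xi(yx_<k y x_k)
        if p > 0:
            total += p * (x + value(ys + (y,), xs + (x,), m))
    return total

def aixi_action(ys, xs, m):
    return max(ACTIONS, key=lambda y: action_value(ys, xs, y, m))

print(aixi_action((), (), 2), value((), (), 2))   # prefers the a-priori simpler environment
```

After the first reward, only one environment remains consistent, so the toy agent then acts optimally for it. This is the finite-class counterpart of the convergence discussed below.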



(Non)parameters of AI$\xi$: The AI$\xi$ model and its behaviour are completely defined by (9) and (10). It (slightly) depends on the choice of the universal Turing machine. The AI$\xi$ model also depends on the choice of $X$ and $Y$, but we do not expect any bias when the spaces are chosen sufficiently large and simple, e.g. all strings of length $2^{16}$. Choosing $\mathbb{N}$ as word space would be ideal, but whether the maxima (or suprema) exist in this case has to be shown beforehand. The only non-trivial dependence is on the horizon $m$. Ideally we would like to choose $m = \infty$, but there are several subtleties, to be discussed later, which prevent at least a naive limit $m\to\infty$. So apart from $m$ and unimportant details, the AI$\xi$ system is uniquely defined by (10) and (9) without adjustable parameters. It does not depend on any assumption about the environment apart from being generated by some computable (but unknown!) probability distribution, as we will see.

$\xi$ is only a semimeasure: One subtlety should be mentioned. Like in the SP case, $\xi$ is not a probability distribution but still satisfies the weaker inequalities

$$\sum_{x_n}\xi(\underline{yx}_{1:n}) \leq \xi(\underline{yx}_{<n}), \qquad \xi(\epsilon) \leq 1. \qquad (11)$$

Note that the sum on the l.h.s. is not independent of $y_n$, unlike for the chronological probability distribution $\mu$. Nevertheless, it is bounded by something (the r.h.s.) which is independent of $y_n$. The reason is that the sum in (9) runs over (partial recursive) chronological functions only, and the functions $q$ which satisfy $q(y_{1:n}) = x_{<n}x'_n$ for some $x'_n \in X$ are a subset of the functions satisfying $q(y_{<n}) = x_{<n}$. We will in general call functions satisfying (11) chronological semimeasures. The important point is that the conditional probabilities $\xi(yx_{<n}y\underline{x}_n) := \xi(\underline{yx}_{1:n})/\xi(\underline{yx}_{<n})$ are $\leq 1$ (by (11) they even sum to at most 1 over $x_n$), as for true probability distributions.

Universality of $\xi^{AI}$: It can be shown that $\xi^{AI}$ defined in (9) is universal and converges to $\mu^{AI}$ analogously to the SP case (5) and (6). The proofs are generalizations from the SP case. The actions $y$ are pure spectators and cause no difficulties in the generalization. This will change when we analyze error/value bounds analogously to (8). The major difference when incorporating $y$ is that in (4), $U(p) = x\omega$ produces strings starting with $x$, whereas in (9) we can demand $q$ to output exactly $n$ words $x_{1:n}$, as $q$ knows $n$ from the number of input words $y_1\ldots y_n$. $\xi^{AI}$ dominates all chronological enumerable semimeasures $\rho$:

$$\xi(\underline{yx}_{1:n}) \geq 2^{-K(\rho)-O(1)}\,\rho(\underline{yx}_{1:n}). \qquad (12)$$

$\xi$ is a universal element, in the sense of (12), in the set of all enumerable chronological semimeasures. This can be proved even for infinite (countable) alphabet [Hut00b].


Convergence of $\xi$ to $\mu^{AI}$: From (12) one can show

$$\sum_{k=1}^{n}\sum_{x_{1:k}}\mu(\underline{yx}_{<k})\big(\xi(yx_{<k}y\underline{x}_k)-\mu(yx_{<k}y\underline{x}_k)\big)^2 < \ln 2\cdot K(\mu)+O(1) \qquad (13)$$

for computable chronological measures $\mu$. The main complication in generalizing (6) to (13) is the generalization to non-binary alphabet [Hut00a]. The $y$ are, again, pure spectators. (13) shows that the $\mu$-expected squared difference of $\mu$ and $\xi$ is finite for computable $\mu$. This, in turn, shows that $\xi(yx_{<k}y\underline{x}_k)$ converges to $\mu(yx_{<k}y\underline{x}_k)$ for $k\to\infty$ with $\mu$ probability 1. If we take a finite product of $\xi$'s and use Bayes' rule, we see that $\xi(yx_{<k}\underline{yx}_{k:k+r})$ also converges to $\mu(yx_{<k}\underline{yx}_{k:k+r})$. More generally, in case of a bounded horizon $h_k \equiv m_k-k+1 \leq h_{max} < \infty$, it follows that

$$\xi(\dot y\dot x_{<k}\underline{\dot y\dot x}_{k:m_k}) \xrightarrow{k\to\infty} \mu(\dot y\dot x_{<k}\underline{\dot y\dot x}_{k:m_k}). \qquad (14)$$

Convergence is only guaranteed for one (e.g. the true) i/o sequence $\dot y\dot x_{<k}\dot y\dot x_{k:m_k}$, but not for alternate sequences $\dot y\dot x_{<k}yx_{k:m_k}$. Since (10) takes an average over all possible future actions and perceptions $yx_{k:m_k}$, not only the one which will finally occur, (14) does not guarantee that the AI$\xi$ output $\dot y^\xi_k$ converges to the AI$\mu$ output $\dot y^\mu_k$. This gap is already present in the SP$\Theta_\rho$ models, but good error bounds could nevertheless be proved. This gives confidence that the outputs $\dot y_k$ of the AI$\xi$ model (10) could converge to the outputs $\dot y_k$ of the AI$\mu$ model (3), at least for a bounded horizon $h_k$. The problems with a fixed horizon $m_k = m$ and especially $m\to\infty$ will be discussed later.

Universally optimal AI systems: We want to call an AI model universal if it is $\mu$-independent (unbiased, model-free) and is able to solve any solvable problem and learn any learnable task. Further, we call a universal model universally optimal if there is no program which can solve or learn significantly faster (in terms of interaction cycles). Several facts point the same way. The AI$\xi$ model is parameterless, $\xi$ converges to $\mu$ in the sense of (13, 14), and the AI$\mu$ model is itself optimal. Moreover, by analogy to SP (8), we expect no other model to converge faster to AI$\mu$. We therefore expect AI$\xi$ to be universally optimal. This is our main claim. Further support is given in [Hut00b] by a detailed analysis of the behaviour of AI$\xi$ for various problem classes, including prediction, optimization, games, and supervised learning.

The choice of the horizon: The only significant arbitrariness in the AI$\xi$ model lies in the choice of the lifespan $m$, or of the horizon $h_k \equiv m_k-k+1$ if we allow a cycle-dependent $m$. We will not discuss ad hoc choices of $h_k$ for specific problems. We are interested in universal choices. The book [Ber95] thoroughly discusses the mathematical problems regarding infinite horizon systems.

In many cases the time we are willing to run a system depends on the quality of its actions. Hence, the lifetime, if finite at all, is not known in advance. Exponential discounting $r_k \to r_k\cdot\gamma^k$ solves the mathematical problem of $m\to\infty$ but is no real solution, since an effective horizon $h \sim 1/\ln(1/\gamma)$ has been introduced. The scale invariant discounting $r_k \to r_k\cdot k^{-\alpha}$ has a dynamic horizon $h \sim k$. This choice has some appeal, as it seems that humans of age $k$ years usually do not plan their lives for more than the next $\sim k$ years. From a practical point of view this model might serve all needs, but from a theoretical point of view we feel uncomfortable with such a limitation of the horizon from the very beginning. A possible way of taking the limit $m\to\infty$ without discounting, and its problems, can be found in [Hut00b].

Another objection against too large choices of $m_k$ is that $\xi(yx_{<k}\underline{yx}_{k:m_k})$ has been proved to be a good approximation of $\mu(yx_{<k}\underline{yx}_{k:m_k})$ only for $k \gg h_k$, which is never satisfied for $m_k = m\to\infty$. On the other hand, it may turn out that the rewards $r_{k'}$ for $k' \gg k$, where $\xi$ may no longer be trusted as a good approximation of $\mu$, are in a sense randomly disturbed, with decreasing influence on the choice of $\dot y_k$. This claim is supported by the forgetfulness property of $\xi$ (see below) and can be proved when restricting to factorizable environments [Hut00b].

We are not sure whether the choice of $m_k$ is of marginal importance, as long as $m_k$ is chosen sufficiently large and of low complexity ($m_k = 2^{2^{\ldots}}$, for instance), or whether the choice of $m_k$ will turn out to be a central topic for the AI$\xi$ model, or for the planning aspect of any universal AI system in general. Most if not all problems in agent design of balancing exploration and exploitation vanish with a sufficiently large choice of the (effective) horizon and/or a sufficiently general prior. We suppose that the limit $m_k\to\infty$ for the AI$\xi$ model results in correct behaviour for weakly separable $\mu$ (see below), and that even the naive limit $m\to\infty$ may exist.

Value bounds and separability concepts: The values $V^*_\rho$ associated with the AI$\rho$ systems correspond roughly to the negative error measure $-E_{n\rho}$ of the SP$\rho$ systems. In the SP case we were interested in small bounds for the error excess $E_{n\Theta_\xi}-E_{n\rho}$. Unfortunately, simple value bounds for AI$\xi$ or any other AI system in terms of $V^*$, analogous to the error bound (8), cannot hold [Hut00b]. We even have difficulties in specifying what we can expect to hold for AI$\xi$ or any AI system which claims to be universally optimal. In SP, the only important property of $\mu$ for proving error bounds was its complexity $K(\mu)$. In the AI case, there are no useful bounds in terms of $K(\mu)$ only. We either have to study restricted problem classes or consider bounds depending on other properties of $\mu$, rather than on its complexity only. In [Hut00b] the difficulties are exhibited by two examples. Several concepts which might be useful for proving value bounds are introduced and discussed. They include forgetful, relevant, asymptotically learnable, farsighted, uniform, (generalized) Markovian, factorizable and (pseudo) passive $\mu$. They are approximately sorted in the order of decreasing generality and are called separability concepts. A first weak bound for passive $\mu$ is proved.


5 Time Bounds and Effectiveness

Non-effectiveness of AI$\xi$: $\xi$ is not a computable but only an enumerable semimeasure. Hence, the output $\dot y_k$ of the AI$\xi$ model is only asymptotically computable. AI$\xi$ yields an algorithm that produces a sequence of trial outputs eventually converging to the correct output $\dot y_k$, but one can never be sure whether one has already reached it. Besides this, convergence is extremely slow, so this type of asymptotic computability is of no direct (practical) use. Furthermore, the replacement of $\xi$ by time-limited versions [LV91, LV97], which is suitable for sequence prediction, has been shown to fail for the AI$\xi$ model [Hut00b]. This leads to the issues addressed next.

Time bounds and effectiveness: Let $p$ be a policy which calculates an acceptable output within a reasonable time $t$ per cycle. This sort of computability assumption, namely that a general purpose computer of sufficient power and appropriate program is able to behave in an intelligent way, is the very basis of AI research. Here it is not necessary to discuss what exactly is meant by 'reasonable time/intelligence' and 'sufficient power'. What we are interested in is whether there is a computable version AI$\xi^t$ of the AI$\xi$ system which is superior or equal to any program $p$ with computation time per cycle of at most $t$.

What one can realistically hope to construct is an AI$\xi^{\tilde t}$ system of computation time $c\cdot t$ per cycle for some constant $c$. The idea is to run all programs $p$ of length $\leq l := l(p)$ and time $\leq t$ per cycle, and to pick the best output in the sense of maximizing the universal value $V_\xi$. The total computation time is $c\cdot t$ with $c \approx 2^l$. Unfortunately, $V_\xi$ cannot be used directly, since this measure is also only semi-computable, and the approximation quality obtained by using computable versions of $\xi$ given a time of order $c\cdot t$ is crude [LV97, Hut00b]. On the other hand, we have to use a measure which converges to $V_\xi$ for $t, l\to\infty$, since the AI$\xi^{\tilde t}$ model should converge to the AI$\xi$ model in that case.

Valid approximations: A solution satisfying the above conditions is suggested in [Hut00b]. The main idea is to consider extended chronological incremental policies $p$, which in addition to the regular output $y_k$ rate their own output with $w_k$. The AI$\xi^{tl}$ model selects the output $\dot y_k = y^p_k$ of the policy $p$ with the highest rating $w_k$. $p$ might suggest any output $y_k$, but it is not allowed to rate itself with an arbitrarily high $w_k$ if one wants $w_k$ to be a reliable criterion for selecting the best $p$. One must demand that no policy $p$ is allowed to claim that it is better than it actually is. In [Hut00b] a (logical) predicate VA($p$), called valid approximation, is defined, which is true if, and only if, $p$ always satisfies $w_k \leq V^p_\xi(yx_{<k})$, i.e. never overrates itself. $V^p_\xi(yx_{<k})$ is the $\xi$-expected future reward under policy $p$. Valid policies $p$ can then be (partially) ordered w.r.t. their rating $w_k$.
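The rating-based selection just described can be sketched together with the forced-stop rule used in the construction of AI$\xi^{tl}$. Under that rule, a policy that exceeds the per-cycle time bound $t$ is stopped and receives rating $w_k = 0$. This is a hypothetical illustration only. The proof search establishing VA($p$) is not modeled, so the policies passed in are simply assumed to be valid. One computation "step" is modeled as one Python `yield`. The names `run_bounded`, `select_output`, `fast_policy` and `slow_policy` are made up.

```python
def run_bounded(policy, history, t, fallback_action):
    """Run a policy for at most t steps (one step = one `yield`). If it has not
    returned (w_k, y_k) by then, force a stop with rating w_k = 0."""
    steps = policy(history)
    try:
        for _ in range(t + 1):          # t yields allowed; the next call must finish
            next(steps)
    except StopIteration as done:
        return done.value               # the generator's `return (w_k, y_k)`
    return 0.0, fallback_action

def select_output(valid_policies, history, t, fallback_action="noop"):
    """Output y_k of the (assumed valid) policy with the highest rating w_k."""
    rated = [run_bounded(p, history, t, fallback_action) for p in valid_policies]
    _, best_y = max(rated, key=lambda wy: wy[0])
    return best_y

def fast_policy(history):
    for _ in range(3):                  # 3 steps of "computation"
        yield
    return 0.6, "a"

def slow_policy(history):
    for _ in range(50):                 # 50 steps: better rating, but slower
        yield
    return 0.9, "b"

print(select_output([fast_policy, slow_policy], (), t=10))    # slow policy cut off
print(select_output([fast_policy, slow_policy], (), t=100))   # slow policy finishes
```

A tight time bound makes the agent fall back on whatever valid policy finishes in time. Raising $t$ lets better but slower policies win, in line with the convergence of $p^*$ to AI$\xi$ for $t, l\to\infty$.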



The universal time bounded AI$\xi^{tl}$ system: In the following, we describe the algorithm $p^*$ underlying the universal time bounded AI$\xi^{tl}$ system. It is essentially based on the selection of the best algorithms $p^*_k$ out of the time $t$ and length $l$ bounded policies $p$ for which there exists a proof $P$ of VA($p$) with length $\leq l_P$.

1. Create all binary strings of length $l_P$ and interpret each as a coding of a mathematical proof in the same formal logic system in which VA($\cdot$) has been formulated. Take those strings which are proofs of VA($p$) for some $p$ and keep the corresponding programs $p$.

2. Eliminate all $p$ of length $> l$.

3. Modify all $p$ in the following way: all output $w_ky_k$ is temporarily written on an auxiliary tape. If $p$ stops within $t$ steps, the internal 'output' is copied to the output tape. If $p$ does not stop after $t$ steps, a stop is forced, and $w_k = 0$ and some arbitrary $y_k$ are written on the output tape. Let $\mathcal{P}$ be the set of all those modified programs.

4. Start first cycle: $k := 1$.

5. Run every $p \in \mathcal{P}$ on extended input $\dot y\dot x_{<k}$, where all outputs are redirected to some auxiliary tape: $p(\dot y\dot x_{<k}) = w^p_1y^p_1\ldots w^p_ky^p_k$. This step is performed incrementally by adding $\dot y\dot x_{k-1}$ for $k > 1$ to the input tape and continuing the computation of the previous cycle.

6. Select the program $p$ with the highest rating $w_k$: $p^*_k := \arg\max_p w^p_k$.

7. Write $\dot y_k := y^{p^*_k}_k$ to the output tape.

8. Receive input $\dot x_k$ from the environment.

9. Begin next cycle: $k := k+1$, goto step 5.

Properties of the $p^*$ algorithm: Let $p$ be any extended chronological (incremental) policy of length $l(p) \leq l$ and computation time per cycle $t(p) \leq t$, for which there exists a proof of VA($p$) of length $\leq l_P$. The algorithm $p^*$, depending on $l$, $t$ and $l_P$ but not on $p$, always has higher rating than any such $p$. The setup time of $p^*$ is $t_{setup}(p^*) = O(l_P\cdot 2^{l_P})$ and the computation time per cycle is $t_{cycle}(p^*) = O(2^l\cdot t)$. Furthermore, for $t, l\to\infty$, $p^*$ converges to the behavior of the AI$\xi$ model.

Roughly speaking, this means that if there exists a computable solution to some AI problem at all, then the explicitly constructed algorithm $p^*$ is such a solution. Although this claim is quite general, there are some limitations and open questions: regarding the setup time, regarding the necessity that the policies must rate their own output, regarding true but not efficiently provable VA($p$), and regarding "inconsistent" policies [Hut00b].

6 Outlook & Discussion

This section contains some discussion and remarks on otherwise unmentioned topics.

Value bounds: Rigorous proofs of value bounds for the AI$\xi$ theory are the major theoretical challenge, general ones as well as tighter bounds for special environments $\mu$. Of special importance are suitable (and acceptable) conditions on $\mu$ under which $\dot y_k$ and finite value bounds exist for infinite $Y$, $X$ and $m$.

Scaling AI$\xi$ down: [Hut00b] shows for several examples how to integrate problem classes into the AI$\xi$ model. Conversely, one can downscale the AI$\xi$ model by using more restricted forms of $\xi$. This could be done in a similar way as the theory of universal induction has been downscaled, with many insights, to the Minimum Description Length principle [LV92, Ris89] or to the domain of finite automata [FMG92]. The AI$\xi$ model might similarly serve as a super model or as the very definition of (universal unbiased) intelligence, from which specialized models could be derived.



Applications: [Hut00b] shows how a number of AI problem classes, including sequence prediction, strategic games, function minimization and supervised learning, fit into the general AI$\xi$ model. All problems are claimed to be formally solved by the AI$\xi$ model. The solution is, however, only formal, because the AI$\xi$ model is uncomputable or, at best, approximable. First, each problem class is formulated in its natural way (when $\mu^{\rm problem}$ is known), and then a formulation within the AI$\mu$ model is constructed and their equivalence is proven. Then, the consequences of replacing $\mu$ by $\xi$ are considered. The main goal is to understand how the problems are solved by AI$\xi$. For more details see [Hut00b].



Implementation and approximation: The AI$\xi^{tl}$ model suffers from the same large factor $2^l$ in computation time as Levin search for inversion problems [Lev73, Lev84]. Nevertheless, Levin search has been implemented and successfully applied to a variety of problems [Sch97, SZW97]. Hence, a direct implementation of the AI$\xi^{tl}$ model may also be successful, at least in toy environments, e.g. prisoner problems. The AI$\xi^{tl}$ algorithm should be regarded only as the first step toward a computable universal AI model. Elimination of the factor $2^l$ without giving up universality will probably be a very difficult task. One could try to select programs $p$ and prove VA($p$) in a more clever way than by mere enumeration. All kinds of ideas, like heuristic search, genetic algorithms, advanced theorem provers, and many more, could be incorporated. But now we have a problem.

Computability: We seem to have transferred the AI problem just to a different level. This shift has some advantages (and also some disadvantages) but presents, in no way, a solution. Nevertheless, we want to stress that we have reduced the AI problem to (mere) computational questions. Even the most general other systems the author is aware of depend on some (more than complexity) assumptions about the environment, or it is far from clear whether they are, indeed, universally optimal. Although computational questions are themselves highly complicated, this reduction is a non-trivial result. A formal theory of something, even if not computable, is often a great step toward solving a problem and also has merits of its own (see previous paragraphs).

Elegance: Many researchers in AI believe that intelligence is something complicated and cannot be condensed into a few formulas. They believe it is more a matter of combining enough methods and much explicit knowledge in the right way. From a theoretical point of view we disagree, as the AI$\xi$ model is simple and seems to serve all needs. From a practical point of view we agree to the following extent. To reduce the computational burden, one should provide special purpose algorithms (methods) from the very beginning, probably many of them aimed at reducing the complexity of the input and output spaces $X$ and $Y$ by appropriate pre/post-processing methods.

Extra knowledge: There is no need to incorporate extra knowledge from the very beginning. It can be presented in the first few cycles in any format. As long as the algorithm that interprets the data is of size $O(1)$, the AI$\xi$ system will 'understand' the data after a few cycles (see [Hut00b]). If the environment $\mu$ is complicated but extra knowledge $z$ makes $K(\mu|z)$ small, one can show that the bound (13) reduces to $\ln 2\cdot K(\mu|z)$ when $x_1 \equiv z$, i.e. when $z$ is presented in the first cycle. Special purpose algorithms could also be presented in $x_1$, but it would be cheating to say that no special purpose algorithms have been implemented in AI$\xi$. The boundary between implementation and training is blurred in the AI$\xi$ model.

Training: We have not said much about the training process itself, as it is not specific to the AI$\xi$ model and has been discussed in the literature in various forms and disciplines. A serious discussion would be out of place. To repeat a truism, it is, of course, important to present enough knowledge $x'_k$ and to evaluate the system output $y_k$ with $r_k$ in a reasonable way. To maximize the information content in the reward, one should start with simple tasks and give positive reward to approximately the better half of the outputs $y_k$, for instance.



The big questions: [Hut00b] contains a discussion of the "big" questions concerning the mere existence of any computable, fast, and elegant universal theory of intelligence, related to non-computable $\mu$ [Pen94] and the 'number of wisdom' $\Omega$ [Cha75, Cha91].